Lab 7: Event Selection Optimization & Jackknife
Megan Miyasaki
Partner: Kuan Lee
For this lab, we will again work with the Higgs classification data from Lab 5, and apply what we learned in Labs 5 and 6 to this double lab covering Labs 7 and 8. As a recap, for Lab 5 we worked with training datasets from one of the two pT-range folders. Each folder contains 2 files, each with 100k jets. The signal dataset is labeled "higgs" and the background dataset is labeled "qcd." My lab partner and I took two different sets of Higgs classification data; I worked with the low-pT sample (250-500) and analyzed:
- Do all features provide discrimination power between signal and background?
- Are there correlations among these features?
- Compute expected discovery sensitivity by normalizing each sample appropriately.
- Develop a plan to optimize the discovery sensitivity by applying selections to these features.
Lab 6 was a signal injection test; sadly we could not work with 'real world' data, so we built our own background and injected our own signal strengths into the background data we made.
For this lab, Lab 7, we will be doing a jackknife test to hunt for very subtle effects. We will:
- Identify a concern
- Split the data into two based on the concern
- Analyze the two datasets separately and take the difference.
- A statistically significant difference means you have a problem; no difference means you can check off the worry.
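The split-and-compare procedure above can be sketched in MATLAB. This is a minimal illustration with placeholder data and a made-up splitting condition (first half vs. second half of the events), not the lab's actual analysis:

```matlab
% Jackknife-style check sketch: split on a concern, compare the two halves.
x = randn(1, 1000);                      % placeholder feature values (assumption)
isFirstHalf = (1:numel(x)) <= numel(x)/2;  % the "concern" split, e.g. early vs late events
xA = x(isFirstHalf);
xB = x(~isFirstHalf);

% Difference of means and its uncertainty for two independent samples
d      = mean(xA) - mean(xB);
sigmaD = sqrt(var(xA)/numel(xA) + var(xB)/numel(xB));
z      = d / sigmaD;                     % significance of the difference

if abs(z) > 3
    disp('Statistically significant difference: investigate the concern.');
else
    disp('No significant difference: concern can be checked off.');
end
```

A large |z| flags a problem with the split variable; a small |z| lets you check off that worry.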
The first thing we will do is make a stacked histogram plot for the feature variable mass. As a reminder:
‘mass’ ~ m
The invariant mass; if the two photons originated from a decaying Higgs boson, their invariant mass can be identified with the mass of the decaying Higgs boson; that is, a Higgs mass of about 126 GeV.
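As a reminder of where this variable comes from, the invariant mass of a pair of (massless) photons can be computed from their energies and opening angle via m^2 = 2 E1 E2 (1 - cos θ). The numbers below are illustrative only, not taken from the dataset:

```matlab
% Illustrative diphoton invariant mass (made-up energies and angle):
E1 = 80;  E2 = 60;        % photon energies in GeV (assumed)
theta12 = 1.6;            % opening angle in radians (assumed)
m = sqrt(2*E1*E2*(1 - cos(theta12)));   % invariant mass in GeV
```

A value of m near 126 GeV would then be the Higgs-candidate signature we look for in the mass histogram.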
I will download the data to read and analyze with the code below, and create a stacked histogram of the variable mass:
% downloading and importing the data for problem 1:
h5disp("higgs_100000_pt_250_500.h5");
HDF5 higgs_100000_pt_250_500.h5
Group '/'
Dataset 'higgs_100000_pt_250_500'
Size: 14x100000
MaxSize: 14x100000
Datatype: H5T_IEEE_F64LE (double)
ChunkSize: []
Filters: none
FillValue: 0.000000
signal=h5read("higgs_100000_pt_250_500.h5", '/higgs_100000_pt_250_500');
h5disp("qcd_100000_pt_250_500.h5");
HDF5 qcd_100000_pt_250_500.h5
Group '/'
Dataset 'qcd_100000_pt_250_500'
Size: 14x100000
MaxSize: 14x100000
Datatype: H5T_IEEE_F64LE (double)
ChunkSize: []
Filters: none
FillValue: 0.000000
background=h5read("qcd_100000_pt_250_500.h5", '/qcd_100000_pt_250_500');
mass=background(4,:);  masss=signal(4,:);  % row 4 holds the mass feature
h1=histogram(mass,'Normalization','probability');
hold on
h2=histogram(masss,'Normalization','probability');
legend('background','signal')
title('Stacked Histogram: Distribution of Mass')
From this graph we see that the background has a Poisson-like distribution with a slight tail to the right, while the signal is a sharp, narrow peak.
From past labs we know that the Poisson probability density function for given values x and λ is P(x; λ) = λ^x e^(−λ) / x!.
Since I do not know much about my background data aside from the fact that it looks roughly Poisson distributed, I will use the fitdist function in MATLAB to learn more.
poisson=fitdist(mass','Poisson') % gives me the mean of the background.
poisson =
PoissonDistribution
Poisson distribution
lambda = 97.738 [97.6768, 97.7993]
I will now find how significant the signal is relative to the background mean by treating the distribution as Gaussian (normal). Since the signal should be centered around the true mass of the Higgs boson, about 125 GeV, we will find how many sigma that value lies from the mean of our background.
The code below finds the significance of the mass value 125 relative to the background mean:
prob=cdf(poisson,125) %125 is the true mass of a higgs boson
sigma=icdf('Normal',prob,0,1)
With this I see that the Poisson background probability below 125 is 0.9966, and the signal sits about 2.7 sigma away from the mean of the Poisson background.
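As a cross-check, the same machinery can be run in reverse to ask which mass value would sit a full 5 sigma above the background mean. This is a sketch reusing the fitted mean quoted above:

```matlab
% Reverse calculation sketch: the x-value 5 sigma above the fitted
% Poisson background mean (lambda = 97.738 from fitdist above).
lambda = 97.738;
p5 = cdf('Normal', 5, 0, 1);        % one-sided probability at +5 sigma
x5 = icdf('Poisson', p5, lambda);   % roughly lambda + 5*sqrt(lambda)
```

Since 125 GeV lies well below this x-value, the raw (uncut) sample cannot reach discovery-level significance on its own.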
The last thing I want to do is compare this to the expected significance from the normalized yields:
N_mass=20000; %background, qcd
N_masss=100; %higgs, signal
We find the expected significance N_masss/sqrt(N_mass) = 0.7071, which is not equivalent to the probability or sigma values found earlier, so we need to try to identify a good mass cut to optimize the expected significance. Upon further review from my TA, these values should be comparable and close to one; making cuts would only improve the expected significance once the background and unnecessary signal are cut.
Mass Cuts to Optimize the Expected Significance
I want to find cuts that increase the SNR by removing mostly background; to do that, I will remove all data outside the region that contains the majority of the signal.
Looking at the graph, the mass of the Higgs boson is 125, so I will cut away values lower than 120 and higher than 130, keeping only the window close to the true mass.
mass(mass<=120)=[]; %less than
mass(mass>130)=[]; %greater than
N_mass=length(mass)*20000; %background, qcd
masss(masss<=120)=[]; %less than
masss(masss>130)=[]; %greater than
N_masss=length(masss)*100; %higgs, signal
h3=histogram(mass,50,'Normalization','probability');
hold on
h4=histogram(masss,50,'Normalization','probability');
legend('background','signal')
title('Cut 1: Distribution of Mass')
From the plot we see that the background still has a slight effect, and we can cut even more from both. The new expected significance from the ratio N_masss/sqrt(N_mass) = 766.0922. I will now see what happens if I cut even more data away.
mass(mass<=123)=[]; %less than
mass(mass>127)=[]; %greater than
N_mass=length(mass)*20000; %background, qcd
masss(masss<=123)=[]; %less than
masss(masss>127)=[]; %greater than
N_masss=length(masss)*100;%higgs, signal
h3=histogram(mass,50,'Normalization','probability');
h4=histogram(masss,50,'Normalization','probability');
legend('background','signal')
title('Cut 2: Distribution of Mass')
Again, from the plot we see that the background is still affecting the true signal at 125. The new expected significance from the ratio N_masss/sqrt(N_mass) = 910. I will now do one more cut to see if it affects the expected significance.
mass(mass<=124)=[]; %less than
mass(mass>126)=[]; %greater than
N_mass=length(mass)*20000; %background, qcd
masss(masss<=124)=[]; %less than
masss(masss>126)=[]; %greater than
N_masss=length(masss)*100;%higgs, signal
h3=histogram(mass,50,'Normalization','probability');
h4=histogram(masss,50,'Normalization','probability');
legend('background','signal')
title('Cut 3: Distribution of Mass')
Again, from the plot we see that the background is hidden behind the signal. The new expected significance from the ratio N_masss/sqrt(N_mass) = 876, which is lower than for the second cut, meaning I have cut too much. From this I know that the cuts that maximize the significance are 123 as a low and 127 as a high.
Stacked Histogram Plots & Optimized Event Selections (if necessary) for the Rest of the Features
Here I will make plots of the rest of the features and see if another feature is as discriminative as the mass feature; in general, we are checking whether they have an equal or better significance after feature cuts. As a reminder, the rest of the features are:
‘pt’ ~ pT
The Higgs boson transverse momentum. The transverse plane is the xy plane, with phi as the angle of measurement. It is calculated from the transverse energy deposited in the calorimeters, with units of GeV. pT,1 and pT,2 are the transverse momenta of the two subjets merged in the final step of clustering.
‘eta’ ~ η
The pseudorapidity is a commonly used spatial coordinate describing the angle of a particle relative to the beam axis. It is defined as η = −ln tan(θ/2), where θ is the angle between the particle three-momentum and the positive direction of the beam axis.
‘phi’ ~ ϕ
The azimuthal angle, which is used for the jet azimuthal-angle correlations in the production of a Higgs boson pair of two jets at hadron colliders.
Subjettiness ~ τN
N-subjettiness variables are defined by clustering the constituents of a jet with the exclusive kt algorithm and requiring that N subjets are found. Given N subjet axes in a fat jet, N-subjettiness is given by
τN = (1/d0) Σk pT,k · min(ΔR1,k, ΔR2,k, …, ΔRN,k)
where ΔRJ,k is the angular separation between constituent k and candidate subjet axis J, and d0 is a normalization factor given by
d0 = Σk pT,k · R0
with R0 = 0.8 for AK8 clustering.
Also, ‘t1’, ‘t2’, ‘t3’ equate to τ1, τ2, τ3, and ‘t21’ and ‘t32’ are the subjettiness ratios τ2/τ1 and τ3/τ2.
‘ee2’ ~ e2 & ‘ee3’ ~ e3
The 2-point ECF ratio or the 3-point ECF ratio, with ECF = energy correlation function; the ECF ratio is thus a ratio of energy correlation functions. The energy correlation functions are defined with the motivation that (N+1)-point correlators are sensitive to N-prong substructure.
‘d2’ ~ D2
Defined as a double ratio of ECFs, the 3-to-2-point ECF ratio: D2 = e3 / (e2)^3.
‘angularity’ ~ a3
Jet mass (M), width, eccentricity, planar flow and angularity are measured for jets reconstructed using the anti-kt algorithm with distance parameters R=0.6 and 1.0, with transverse momentum pT > 300 GeV and pseudorapidity |η| < 2. Angularity is a jet-shape variable that weights each constituent's energy by a power of its angle to the jet axis.
‘KtDeltaR’ ~ kt ΔR
The ΔR of the two subjets within the large-R jet found with the kt splitting scale. This variable is obtained by reclustering the constituents of a jet with the kt algorithm, which usually clusters the harder constituents last, and then taking the kt distance measure between the two subjets at the final stage of the recombination procedure; ΔR is the corresponding angular separation.
% pt, between higgs and qcd
histogram(pt,50,'Normalization','probability')
histogram(pts,50,'Normalization','probability')
legend('Background(qcd)','Signal (higgs)')
title('Distrbution of p_{T}')
From this plot we see that they are not the same shape, nor do they share the same mean, so the background could be affecting the signal; this might be a good feature to cut on to optimize the expected significance. The signal seems to spike around 475, so I will make cuts at 473 and 478 on both background and signal to optimize the expected significance.
pt(pt<=473)=[]; %less than
pt(pt>478)=[]; %greater than
N_pt=length(pt)*20000; %background, qcd
pts(pts<=473)=[]; %less than
pts(pts>478)=[]; %greater than
N_pts=length(pts)*100;%higgs, signal
h3=histogram(pt,50,'Normalization','probability');
h4=histogram(pts,50,'Normalization','probability');
legend('background','signal')
title('Cuts: Distribution of p_{T}')
I tried several different cuts to maximize the ratio; I am showing only the plot of the final cut, since each attempt is the same calculation with different start and end points. The ratio is maximized with these cuts.
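The trial-and-error window search described above can be automated. Below is a sketch of a hypothetical helper (the function name and arguments are illustrative, not part of the lab code) that tries every candidate [lo, hi] window and returns the one maximizing the expected significance S/sqrt(B), assuming each 100k-event sample is normalized to its expected yield (20000 background, 100 signal events):

```matlab
% scanWindow.m -- hedged sketch of an automated cut scan (illustrative names).
function [bestLo, bestHi, bestZ] = scanWindow(bkg, sig, lows, highs)
    bestZ = -inf;  bestLo = NaN;  bestHi = NaN;
    for lo = lows
        for hi = highs
            if hi <= lo, continue; end
            % Fraction of each sample passing the window, scaled to its yield
            B = sum(bkg > lo & bkg <= hi) * (20000/1e5);  % expected background yield
            S = sum(sig > lo & sig <= hi) * (100/1e5);    % expected signal yield
            if B > 0 && S/sqrt(B) > bestZ
                bestZ = S/sqrt(B);  bestLo = lo;  bestHi = hi;
            end
        end
    end
end
```

For example, scanWindow(mass, masss, 118:0.5:124, 126:0.5:132) would search mass windows around 125 GeV; the same helper could scan pt, ee2, or any other feature.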
% plot eta of background and signal
histogram(eta,50,'Normalization','probability');
histogram(etas,50,'Normalization','probability');
legend('background','signal')
title('Distribution of \eta')
The eta plot shows that background and signal have a similar shape and share the same mean around zero, so making cuts to this feature may not be necessary.
histogram(phi,50,'Normalization','probability')
histogram(phis,50,'Normalization','probability')
legend('background','signal')
title('Distribution of \phi')
The plot for phi shows that the background and signal share a similar pattern and shape, and both are distributed evenly around zero.
histogram(ee2,50,'Normalization','probability')
hold on
histogram(ee2s,50,'Normalization','probability')
legend('background','signal')
title('Distribution of ee_{2}')
The ee2 plot has such different shapes and means that it is clear the background is affecting the signal; cuts to optimize the expected significance will help a lot. I will cut anything below 0.15 and above 0.17, effectively getting rid of a lot of background while keeping most of my signal, and then replot the distribution with the cuts that optimize the expected significance.
ee2(ee2<=0.15)=[]; %less than
ee2(ee2>0.17)=[]; %greater than
N_ee2=length(ee2)*20000; %background, qcd
ee2s(ee2s<=0.15)=[]; %less than
ee2s(ee2s>0.17)=[]; %greater than
N_ee2s=length(ee2s)*100;%higgs, signal
histogram(ee2,50,'Normalization','probability');
histogram(ee2s,50,'Normalization','probability');
legend('background','signal')
title('Cuts: Distribution of ee_{2}')
This is the plot of the ee2 cut that optimized the ratio; other calculations were made, but this is the final one.
histogram(ee3,50,'Normalization','probability')
hold on
histogram(ee3s,50,'Normalization','probability')
legend('background','signal')
title('Distribution of ee_{3}')
The ee3 plot has the same shape, and its spike shares the same value as the background's. I will not perform cuts on ee3.
% comparing d2 background and signal
histogram(d2,50,'Normalization','probability')
hold on
histogram(d2s,50,'Normalization','probability')
legend('background','signal')
title('Distribution of D2')
While D2 has the same type of exponential decay and the spikes sit in the same area, I do not think the background is affecting the signal enough to warrant cuts to optimize the expected significance.
histogram(ang,50,'Normalization','probability')
histogram(angs,50,'Normalization','probability')
legend('background','signal')
title('Distribution of angularity')
The angularity plot seems to share the same exponential-decay shape as well as the same placement of the spike; any cuts would be trivial and would not affect the expected significance much.
histogram(t1,50,'Normalization','probability')
histogram(t1s,50,'Normalization','probability')
legend('background','signal')
title('Distribution of t_{1}')
The plots are similar enough that I would not perform cuts to optimize the expected significance.
histogram(t2,50,'Normalization','probability')
histogram(t2s,50,'Normalization','probability')
legend('background','signal')
title('Distribution of t_{2}')
The plot shows two very different shapes and means for signal and background, so I will perform cuts to optimize the expected significance. I will cut anything above 0.3, since everything above that is mostly background and I can retain as much signal data as possible; the signal mean is also about 0.31. Once I have found the optimized ratio I will replot my data.
t2(t2>0.3)=[]; %greater than
N_t2=length(t2)*20000; %background, qcd
t2s(t2s>0.3)=[]; %greater than
N_t2s=length(t2s)*100;%higgs, signal
histogram(t2,50,'Normalization','probability');
histogram(t2s,50,'Normalization','probability');
legend('background','signal')
title('Cuts: Distribution of t_{2}')
I did multiple cuts to find which one maximized the ratio; the cut that maximized it was 0.3.
histogram(t3,50,'Normalization','probability')
histogram(t3s,50,'Normalization','probability')
legend('background','signal')
title('Distribution of t_{3}')
The plot shows two very different shapes and means for signal and background, so I will perform cuts to optimize the expected significance. I will cut anything above 0.3, as most of it is background that is clearly affecting the signal.
t3(t3>0.3)=[]; %greater than
N_t3=length(t3)*20000; %background, qcd
t3s(t3s>0.3)=[]; %greater than
N_t3s=length(t3s)*100;%higgs, signal
histogram(t3,50,'Normalization','probability');
histogram(t3s,50,'Normalization','probability');
legend('background','signal')
title('Cuts: Distribution of t_{3}')
This is the plot for the final cuts that optimizes the ratio.
histogram(t21,50,'Normalization','probability')
histogram(t21s,50,'Normalization','probability')
legend('background','signal')
title('Distribution of t_{21}')
The plot shows two very different shapes and means for signal and background, so I will perform cuts to optimize the expected significance. I will cut anything above 0.3, as it is mostly background data; I also want to stay consistent with the previous cuts on t2 and t3.
t21(t21>0.3)=[]; %greater than
N_t21=length(t21)*20000; %background, qcd
t21s(t21s>0.3)=[]; %greater than
N_t21s=length(t21s)*100;%higgs, signal
histogram(t21,50,'Normalization','probability');
histogram(t21s,50,'Normalization','probability');
legend('background','signal')
title('Cuts: Distribution of t_{21}')
This is the plot of the final cut that maximizes the ratio.
histogram(t32,50,'Normalization','probability')
histogram(t32s,50,'Normalization','probability')
legend('background','signal')
title('Distribution of t_{32}')
The plots are similar enough that I would not perform cuts to optimize the expected significance.
histogram(kt,50,'Normalization','probability')
histogram(kts,50, 'Normalization','probability')
legend('background','signal')
title('Distribution of k_t\DeltaR')
I will cut this feature to optimize the expected significance, getting rid of most of the background data while keeping as much of the signal data as possible. I will cut anything below 0.4 and above 0.9, seeing as the signal spikes around 0.5.
kt(kt<=0.4)=[]; %less than
kt(kt>0.9)=[]; %greater than
N_kt=length(kt)*20000; %background, qcd
kts(kts<=0.4)=[]; %less than
kts(kts>0.9)=[]; %greater than
N_kts=length(kts)*100;%higgs, signal
histogram(kt,50,'Normalization','probability');
histogram(kts,50,'Normalization','probability');
legend('background','signal')
title('Cuts: Distribution of k_t\DeltaR')
This is the plot of the cuts that maximize the ratio. Again, I found them by trial and error; since the calculations are identical apart from the start and end values, only the final cuts are shown.
After applying cuts to each of the features, I still found that the mass cuts gave the best significance; therefore, I will plot my other features using the mass cuts to optimize the significance even further.
Luminosity Data
Lab 8: Pseudo-Experiment Data Analysis:
Megan Miyasaki
Partner: Kuan Lee
For this part of the two-part lab we will use pseudo-experiment data analysis. Using our optimized event selection, we will hunt for a signal in one of the pseudo-experiment datasets. For each task below, we choose one of the observed datasets from our specific pT sample to perform the analysis.
Below is the code for the high and low luminosity data:
%downloading and importing the data for problem 1:
h5disp("data_highLumi_pt_250_500.h5");
HDF5 data_highLumi_pt_250_500.h5
Group '/'
Attributes:
'TITLE': ''
'CLASS': 'GROUP'
'VERSION': '1.0'
'PYTABLES_FORMAT_VERSION': '2.1'
Group '/data'
Attributes:
'TITLE': ''
'CLASS': 'GROUP'
'VERSION': '1.0'
'pandas_type': 'frame'
'pandas_version': '0.15.2'
'encoding': 'UTF-8'
'errors': 'strict'
'ndim': 2
'axis0_variety': 'regular'
'axis1_variety': 'regular'
'nblocks': 1
'block0_items_variety': 'regular'
Dataset 'axis0'
Size: 14
MaxSize: 14
Datatype: H5T_STRING
String Length: 10
Padding: H5T_STR_NULLTERM
Character Set: H5T_CSET_ASCII
Character Type: H5T_C_S1
ChunkSize: []
Filters: none
FillValue: ' '
Attributes:
'CLASS': 'ARRAY'
'VERSION': '2.4'
'TITLE': ''
'FLAVOR': 'numpy'
'transposed': 1
'kind': 'string'
'name': 'N.'
Dataset 'axis1'
Size: 40344
MaxSize: 40344
Datatype: H5T_STD_I64LE (int64)
ChunkSize: []
Filters: none
FillValue: 0
Attributes:
'CLASS': 'ARRAY'
'VERSION': '2.4'
'TITLE': ''
'FLAVOR': 'numpy'
'transposed': 1
'kind': 'integer'
'name': 'N.'
Dataset 'block0_items'
Size: 14
MaxSize: 14
Datatype: H5T_STRING
String Length: 10
Padding: H5T_STR_NULLTERM
Character Set: H5T_CSET_ASCII
Character Type: H5T_C_S1
ChunkSize: []
Filters: none
FillValue: ' '
Attributes:
'CLASS': 'ARRAY'
'VERSION': '2.4'
'TITLE': ''
'FLAVOR': 'numpy'
'transposed': 1
'kind': 'string'
'name': 'N.'
Dataset 'block0_values'
Size: 14x40344
MaxSize: 14x40344
Datatype: H5T_IEEE_F64LE (double)
ChunkSize: []
Filters: none
FillValue: 0.000000
Attributes:
'CLASS': 'ARRAY'
'VERSION': '2.4'
'TITLE': ''
'FLAVOR': 'numpy'
'transposed': 1
high=h5read("data_highLumi_pt_250_500.h5", '/data/block0_values');
h5disp("data_lowLumi_pt_250_500.h5");
HDF5 data_lowLumi_pt_250_500.h5
Group '/'
Attributes:
'TITLE': ''
'CLASS': 'GROUP'
'VERSION': '1.0'
'PYTABLES_FORMAT_VERSION': '2.1'
Group '/data'
Attributes:
'TITLE': ''
'CLASS': 'GROUP'
'VERSION': '1.0'
'pandas_type': 'frame'
'pandas_version': '0.15.2'
'encoding': 'UTF-8'
'errors': 'strict'
'ndim': 2
'axis0_variety': 'regular'
'axis1_variety': 'regular'
'nblocks': 1
'block0_items_variety': 'regular'
Dataset 'axis0'
Size: 14
MaxSize: 14
Datatype: H5T_STRING
String Length: 10
Padding: H5T_STR_NULLTERM
Character Set: H5T_CSET_ASCII
Character Type: H5T_C_S1
ChunkSize: []
Filters: none
FillValue: ' '
Attributes:
'CLASS': 'ARRAY'
'VERSION': '2.4'
'TITLE': ''
'FLAVOR': 'numpy'
'transposed': 1
'kind': 'string'
'name': 'N.'
Dataset 'axis1'
Size: 4060
MaxSize: 4060
Datatype: H5T_STD_I64LE (int64)
ChunkSize: []
Filters: none
FillValue: 0
Attributes:
'CLASS': 'ARRAY'
'VERSION': '2.4'
'TITLE': ''
'FLAVOR': 'numpy'
'transposed': 1
'kind': 'integer'
'name': 'N.'
Dataset 'block0_items'
Size: 14
MaxSize: 14
Datatype: H5T_STRING
String Length: 10
Padding: H5T_STR_NULLTERM
Character Set: H5T_CSET_ASCII
Character Type: H5T_C_S1
ChunkSize: []
Filters: none
FillValue: ' '
Attributes:
'CLASS': 'ARRAY'
'VERSION': '2.4'
'TITLE': ''
'FLAVOR': 'numpy'
'transposed': 1
'kind': 'string'
'name': 'N.'
Dataset 'block0_values'
Size: 14x4060
MaxSize: 14x4060
Datatype: H5T_IEEE_F64LE (double)
ChunkSize: []
Filters: none
FillValue: 0.000000
Attributes:
'CLASS': 'ARRAY'
'VERSION': '2.4'
'TITLE': ''
'FLAVOR': 'numpy'
'transposed': 1
low=h5read("data_lowLumi_pt_250_500.h5", '/data/block0_values');
High Luminosity data:
For this part of the lab we will focus on each feature of our event selection: plot the observed data with the expected signal and background (normalized to observed yields) without the event selection, then observe the overlap with the expected signal and background with the optimal event selection applied. Lastly, we will evaluate the observed significance and compare the results to expectation.
I cut 7 of the 14 features to optimize the ratio: mass, pt, ee2, t2, t3, t21, and KtDeltaR. In this part of the lab I will look at only those features:
Since my code redefines all my variables to what they were cut, I will have to redefine what my variables are without the cuts:
Without Optimizing cuts:
These are all the plots of the variables without the maximizing cuts.
histogram(mass,'Normalization','probability');
histogram(masss,'Normalization','probability');
histogram(hmass,'Normalization','probability');
legend('background','signal','high luminosity')
title('Stacked Histogram: Distribution of Mass')
histogram(pt,'Normalization','probability');
histogram(pts,'Normalization','probability');
histogram(hpt,'Normalization','probability');
legend('background','signal','high luminosity')
title('Stacked Histogram: Distribution of p_{T}')
histogram(ee2,'Normalization','probability');
histogram(ee2s,'Normalization','probability');
histogram(hee2,'Normalization','probability');
legend('background','signal','high luminosity')
title('Stacked Histogram: Distribution of ee_{2}')
histogram(t2,'Normalization','probability');
histogram(t2s,'Normalization','probability');
histogram(ht2,'Normalization','probability');
legend('background','signal','high luminosity')
title('Stacked Histogram: Distribution of t_{2}')
histogram(t3,'Normalization','probability');
histogram(t3s,'Normalization','probability');
histogram(ht3,'Normalization','probability');
legend('background','signal','high luminosity')
title('Stacked Histogram: Distribution of t_{3}')
histogram(t21,'Normalization','probability');
histogram(t21s,'Normalization','probability');
histogram(ht21,'Normalization','probability');
legend('background','signal','high luminosity')
title('Stacked Histogram: Distribution of t_{21}')
histogram(kt,'Normalization','probability');
histogram(kts,'Normalization','probability');
histogram(hkt,'Normalization','probability');
legend('background','signal','high luminosity')
title('Stacked Histogram: Distribution of k_{t}\DeltaR')
From all the plots I see that the high luminosity follows the same shape as the background data, So I will take a look at it now with the mass cuts
With Optimal Event Selection
mass(mass<=123)=[]; %less than
mass(mass>127)=[]; %greater than
masss(masss<=123)=[]; %less than
masss(masss>127)=[]; %greater than
hmass(hmass<=123)=[]; %less than
hmass(hmass>127)=[]; %greater than
histogram(mass,'Normalization','probability');
histogram(masss,'Normalization','probability');
histogram(hmass,'Normalization','probability');
legend('background','signal','high luminosity')
title('Stacked Histogram: Distribution of Mass')
pt(pt<=473)=[]; %less than
pt(pt>478)=[]; %greater than
pts(pts<=473)=[]; %less than
pts(pts>478)=[]; %greater than
hpt(hpt<=473)=[]; %less than
hpt(hpt>478)=[]; %greater than
histogram(pt,'Normalization','probability');
histogram(pts,'Normalization','probability');
histogram(hpt,'Normalization','probability');
legend('background','signal','high luminosity')
title('Stacked Histogram: Distribution of p_{T}')
ee2(ee2<=0.15)=[]; %less than
ee2(ee2>0.17)=[]; %greater than
ee2s(ee2s<=0.15)=[]; %less than
ee2s(ee2s>0.17)=[]; %greater than
hee2(hee2<=0.15)=[]; %less than
hee2(hee2>0.17)=[]; %greater than
histogram(ee2,'Normalization','probability');
histogram(ee2s,'Normalization','probability');
histogram(hee2,'Normalization','probability');
legend('background','signal','high luminosity')
title('Stacked Histogram: Distribution of ee_{2}')
t2(t2>0.3)=[]; %greater than
t2s(t2s>0.3)=[]; %greater than
ht2(ht2>0.3)=[]; %greater than
histogram(t2,'Normalization','probability');
histogram(t2s,'Normalization','probability');
histogram(ht2,'Normalization','probability');
legend('background','signal','high luminosity')
title('Stacked Histogram: Distribution of t_{2}')
t3(t3>0.3)=[]; %greater than
t3s(t3s>0.3)=[]; %greater than
ht3(ht3>0.3)=[]; %greater than
histogram(t3,'Normalization','probability');
histogram(t3s,'Normalization','probability');
histogram(ht3,'Normalization','probability');
legend('background','signal','high luminosity')
title('Stacked Histogram: Distribution of t_{3}')
t21(t21>0.3)=[]; %greater than
t21s(t21s>0.3)=[]; %greater than
ht21(ht21>0.3)=[]; %greater than
histogram(t21,'Normalization','probability');
histogram(t21s,'Normalization','probability');
histogram(ht21,'Normalization','probability');
legend('background','signal','high luminosity')
title('Stacked Histogram: Distribution of t_{21}')
kt(kt<=0.4)=[]; %less than
kt(kt>0.9)=[]; %greater than
kts(kts<=0.4)=[]; %less than
kts(kts>0.9)=[]; %greater than
hkt(hkt<=0.4)=[]; %less than
hkt(hkt>0.9)=[]; %greater than
histogram(kt,'Normalization','probability');
histogram(kts,'Normalization','probability');
histogram(hkt,'Normalization','probability');
legend('background','signal','high luminosity')
title('Stacked Histogram: Distribution of k_{t}\DeltaR')
From my plots I was able to cut out a lot of background data, as well as signal data that could have been affected by the background, and the luminosity data was cut in step with the mass cuts. Our expected significance was 0.707 from the ratio; our observed is 910 from our mass cuts.
Low Luminosity data:
We will repeat the same steps as we did with our high luminosity data:
Since my code redefines all my variables to what they were cut, I will have to redefine what my variables are without the cuts:
Without Optimizing cuts:
These are all the plots of the variables without the maximizing cuts.
histogram(mass,'Normalization','probability');
histogram(masss,'Normalization','probability');
histogram(lmass,'Normalization','probability');
legend('background','signal','low luminosity')
title('Stacked Histogram: Distribution of Mass')
histogram(pt,'Normalization','probability');
histogram(pts,'Normalization','probability');
histogram(lpt,'Normalization','probability');
legend('background','signal','low luminosity')
title('Stacked Histogram: Distribution of p_{T}')
histogram(ee2,'Normalization','probability');
histogram(ee2s,'Normalization','probability');
histogram(lee2,'Normalization','probability');
legend('background','signal','low luminosity')
title('Stacked Histogram: Distribution of ee_{2}')
histogram(t2,'Normalization','probability');
histogram(t2s,'Normalization','probability');
histogram(lt2,'Normalization','probability');
legend('background','signal','low luminosity')
title('Stacked Histogram: Distribution of t_{2}')
histogram(t3,'Normalization','probability');
histogram(t3s,'Normalization','probability');
histogram(lt3,'Normalization','probability');
legend('background','signal','low luminosity')
title('Stacked Histogram: Distribution of t_{3}')
histogram(t21,'Normalization','probability');
histogram(t21s,'Normalization','probability');
histogram(lt21,'Normalization','probability');
legend('background','signal','low luminosity')
title('Stacked Histogram: Distribution of t_{21}')
histogram(kt,'Normalization','probability');
histogram(kts,'Normalization','probability');
histogram(lkt,'Normalization','probability');
legend('background','signal','low luminosity')
title('Stacked Histogram: Distribution of k_{t}\DeltaR')
The low luminosity plots were all different and did not seem to follow a pattern; hopefully after applying the mass cuts I can see more.
With Optimal Event Selection
mass(mass<=123)=[]; %less than
mass(mass>127)=[]; %greater than
masss(masss<=123)=[]; %less than
masss(masss>127)=[]; %greater than
lmass(lmass<=123)=[]; %less than
lmass(lmass>127)=[]; %greater than
histogram(mass,'Normalization','probability');
histogram(masss,'Normalization','probability');
histogram(lmass,'Normalization','probability');
legend('background','signal','low luminosity')
title('Stacked Histogram: Distribution of Mass')
pt(pt<=473)=[]; %less than
pt(pt>478)=[]; %greater than
pts(pts<=473)=[]; %less than
pts(pts>478)=[]; %greater than
lpt(lpt<=473)=[]; %less than
lpt(lpt>478)=[]; %greater than
histogram(pt,'Normalization','probability');
histogram(pts,'Normalization','probability');
histogram(lpt,'Normalization','probability');
legend('background','signal','low luminosity')
title('Stacked Histogram: Distribution of p_{T}')
ee2(ee2<=0.15)=[]; %less than
ee2(ee2>0.17)=[]; %greater than
ee2s(ee2s<=0.15)=[]; %less than
ee2s(ee2s>0.17)=[]; %greater than
lee2(lee2<=0.15)=[]; %less than
lee2(lee2>0.17)=[]; %greater than
histogram(ee2,'Normalization','probability');
histogram(ee2s,'Normalization','probability');
histogram(lee2,'Normalization','probability');
legend('background','signal','low luminosity')
title('Stacked Histogram: Distribution of ee_{2}')
t2(t2>0.3)=[]; %greater than
t2s(t2s>0.3)=[]; %greater than
lt2(lt2>0.3)=[]; %greater than
histogram(t2,'Normalization','probability');
histogram(t2s,'Normalization','probability');
histogram(lt2,'Normalization','probability');
legend('background','signal','low luminosity')
title('Stacked Histogram: Distribution of t_{2}')
t3(t3>0.3)=[]; %greater than
t3s(t3s>0.3)=[]; %greater than
lt3(lt3>0.3)=[]; %greater than
histogram(t3,'Normalization','probability');
histogram(t3s,'Normalization','probability');
histogram(lt3,'Normalization','probability');
legend('background','signal','low luminosity')
title('Stacked Histogram: Distribution of t_{3}')
t21(t21>0.3)=[]; %greater than
t21s(t21s>0.3)=[]; %greater than
lt21(lt21>0.3)=[]; %greater than
histogram(t21,'Normalization','probability');
histogram(t21s,'Normalization','probability');
histogram(lt21,'Normalization','probability');
legend('background','signal','low luminosity')
title('Stacked Histogram: Distribution of t_{21}')
kt(kt<=0.4)=[]; %less than
kt(kt>0.9)=[]; %greater than
kts(kts<=0.4)=[]; %less than
kts(kts>0.9)=[]; %greater than
lkt(lkt<=0.4)=[]; %less than
lkt(lkt>0.9)=[]; %greater than
histogram(kt,'Normalization','probability');
histogram(kts,'Normalization','probability');
histogram(lkt,'Normalization','probability');
legend('background','signal','low luminosity')
title('Stacked Histogram: Distribution of k_{t}\deltaR')
For our low luminosity data I was able to do the same as with the high luminosity: our expected significance was 0.707 from the ratio, and our observed is 910 from our mass cuts.
95% Confidence Level of signal yields:
Here we will check that in the low luminosity data the observed significance is less than 5σ, and then calculate the 95% confidence level upper limit on the signal yield. Next we will evaluate the expected 95% confidence level upper limit, then the observed 95% confidence level upper limit. Lastly, we will compare expectation to observation.
Since all my cuts are made off of the mass cuts, I will use the mass-cut signal yields.
The code below helps me find the 95% confidence upper bound for the luminosity, expected, and observed data.
back=fitdist(background(4,:)','Poisson');
sig=fitdist(signal(4,:)','Poisson');
lum=fitdist(low(4,:)','Poisson');
From these results we see that the 95% upper bound on the fitted Poisson mean is about 97 for the background, 114 for the signal, and about 97 for the low luminosity data. The low luminosity data and the background are similar, while the signal mean is clearly higher. These confidence intervals are for the mass feature of the data.
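The confidence bounds quoted above can also be pulled out programmatically with paramci, which (like fitdist) comes from the Statistics and Machine Learning Toolbox. A self-contained sketch with placeholder data (in the lab, the fit object would instead come from fitdist(signal(4,:)','Poisson') as above):

```matlab
% Sketch: extract the 95% confidence bounds on a fitted Poisson mean.
x  = poissrnd(114, 1000, 1);        % placeholder draws with an assumed mean of 114
pd = fitdist(x, 'Poisson');         % fit, as done for background/signal/luminosity
ci = paramci(pd, 'Alpha', 0.05);    % 95% CI on lambda: [lower; upper]
upper95 = ci(2);                    % 95% CL upper bound on the mean
```

Comparing upper95 between the expected (signal + background) and observed (low luminosity) fits gives the expectation-vs-observation comparison asked for in this task.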